Systems and methods for consumer-generated media reputation management

ABSTRACT

TruCast is a method for management, by way of gathering, storing, analyzing, tracking, sorting, determining the relevance of, visualizing, and responding to all available consumer generated media. Some examples of consumer generated media include web logs or “blogs”, mobile phone blogs or “mo-blogs”, forums, electronic discussion messages, Usenet, message boards, BBS emulating services, product review and discussion web sites, online retail sites that support customer comments, social networks, media repositories, and digital libraries. Any web hosted system for the persistent public storage of human commentary is a potential target for this method. The system is comprised of a coordinated software and hardware system designed to perform management, collection, storage, analysis, workflow, visualization, and response tasks upon this media. This system permits a unified interface to manage, target, and accelerate interactions within this space, facilitating public relations, marketing, advertising, consumer outreach, political debate, and other modes of directed discourse.

PRIORITY CLAIM

The following application is a continuation of and claims priority to U.S. patent application Ser. No. 11/745,390 filed May 7, 2007, which claims priority to and the benefit of U.S. Provisional Application Ser. No. 60/746,621 filed May 5, 2006, U.S. Provisional Application Ser. No. 60/861,406 filed Nov. 27, 2006, and U.S. Provisional Application Ser. No. 60/903,810 filed Nov. 27, 2006. Each of the foregoing applications is hereby incorporated by reference in its entirety as if fully set forth herein.

This application is also a continuation-in-part of and claims priority to U.S. patent application Ser. No. 12/251,370 filed Oct. 14, 2008 and PCT Application Serial Number PCT/US08/79885 filed Oct. 14, 2008, both of which claim priority to and the benefit of U.S. Provisional Application Ser. No. 60/998,730 filed Oct. 11, 2007; U.S. Provisional Application Ser. No. 61/003,144 filed Nov. 13, 2007; U.S. Provisional Application Ser. No. 61/072,776 filed Apr. 1, 2008; and U.S. Patent Application Ser. No. 61/126,061 filed Apr. 29, 2008. This application is also a continuation-in-part of and claims priority to U.S. patent application Ser. No. 12/192,919 filed Aug. 15, 2008 and PCT Application Serial Number PCT/US08/73401 filed Aug. 15, 2008, both of which claim priority to and the benefit of U.S. Provisional Application Ser. No. 60/965,067 filed Aug. 15, 2007 and U.S. Provisional Application Ser. No. 60/956,097 filed Aug. 15, 2007. This application is also a continuation-in-part of U.S. patent application Ser. No. 12/580,667 filed Oct. 16, 2009, which claims priority to and the benefit of U.S. Provisional Application Ser. No. 61/106,134 filed Oct. 16, 2008, U.S. Provisional Application Ser. No. 61/147,057 filed Jan. 23, 2009, and U.S. Provisional Application Ser. No. 61/241,132 filed Sep. 14, 2009. All of the foregoing applications are incorporated by reference in their entirety as if fully set forth herein.

COPYRIGHT NOTICE

This disclosure is protected under United States and International Copyright Laws. © 2006-2010 Visible Technologies. All Rights Reserved. A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure after formal publication by the USPTO, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

BACKGROUND OF THE INVENTION

As used herein, the term “Consumer Generated Media” (hereinafter CGM) is a phrase that describes a wide variety of Internet web pages or sites, which are sometimes individually labeled as web logs or “blogs”, mobile phone blogs or “mo-blogs”, video hosting blogs or “vlogs” or “vblogs”, forums, electronic discussion messages, Usenet, message boards, BBS emulating services, product review and discussion web sites, online retail sites that support customer comments, social networks, media repositories, audio and video sharing sites/networks and digital libraries. Private non-Internet information systems can host CGM content as well, via environments like Sharepoint, Wiki, Jira, CRM systems, ERP systems, and advertising systems. Other acronyms that describe this space are CCC (consumer created content), WSM (weblogs and social media), WOMM (Word of Mouth Media) or OWOM (online word of mouth), and many others.

As used herein, the term “Keyphrase” refers to a word, string of words, or groups of words with Boolean modifiers that are used as models for discovering CGM content that might be relevant to a given topic. A Keyphrase could also be an example image, audio file or video file that has characteristics that would be used for content discovery and matching.

As used herein, the term “Post” refers to a single piece of CGM content. This might be a literal weblog posting, a comment, a forum reply, a product review, or any other single element of CGM content.

As used herein, the term “Site” refers to an Internet site which contains CGM content.

As used herein, the term “Blog” refers to an Internet site which contains CGM content.

As used herein, the term “Content” refers to media that resides on CGM sites. CGM is often text, but includes audio files and streams (podcasts, mp3, streamcasts, Internet radio, etc.), video files and streams, animations (flash, java) and other forms of multimedia.

As used herein, the term “UI” refers to a User Interface, through which users interact with computer software, perform work, and review results.

As used herein, the term “IM” refers to an Instant Messenger, which is a class of software applications that allow direct text based communication between known peers.

As used herein, the term “Thread” refers to an “original” post and all of the comments connected to it, present on a blog or forum. A discussion thread holds the information of content display order, that is, which message came first, which message followed it, and so on.

As used herein, the term “Permalink” refers to a URL which persistently points to an individual CGM thread.

The Internet and other computer networks are communication systems. The sophistication of this communication has improved and the primary modes differentiated over time and technological progress. Each primary mode of online communication varies based on a combination of three basic values: privacy, persistence, and control. Email as a communications medium is private (communications are initially exchanged only between named recipients), persistent (saved in inboxes or mail servers) but lacks control (once you send the message, you can't take it back, or edit it, or limit re-use of it). Instant messaging is private, typically not persistent (some newer clients are now allowing users to save history, so this mode is changing) and lacks control. Message boards are public (typically all members, and often all Internet users, can access your message), persistent, but lack control (they are typically moderated by a central owner of the board). Chat rooms are public (again, some are membership based), typically not persistent, and lack control.

                     privacy    persistence    author control
Chat Rooms/IRC       no         no             no
Instant Messaging    yes        no             no
Forums               no         yes            no
Email                yes        yes            no
Blogs                no         yes            yes
social networks      yes/no     yes            yes
Second Life          yes        yes            yes+

Blogs and Social Networks are the predominant communications mediums that permit author control. By reducing the cost, technical sophistication, and experience required to create and administer a web site, blogs and other persistent online communication have given an unprecedented amount of editorial control to millions of online authors. This has created a unique new environment for creative expression, commentary, discourse, and criticism without the historical limits of editorial control, cost, technical expertise, or distribution/exposure.

There is significant value in the information contained within this public media. Because the opinions, topics of discussion, brands and celebrities mentioned and relationships evinced are typically totally unsolicited, the information presented, if well studied, represents an amazing new source of social insight, consumer feedback, opinion measurement, popularity analysis and messaging data. It also represents a fully exposed, granular network of peer and hierarchical relationships rich with authority and influence. The marketing, advertising, and PR value of this information is unprecedented.

This new medium represents a significant challenge for interested parties to comprehensively understand and interact with. As of Q1 2007, estimates for the number of active, unique online CGM sites (forums, blogs, social networks, etc.) range from 50 to 71 million, with growth rates in the hundreds of thousands of new sites per day. Compared to the typical mediums that PR, Advertising and Marketing businesses and divisions interact with (<1000 TV channels, <1000 radio stations, <1000 major news publications, <10-20 major pundits on any given subject, etc.) this represents a nearly 10,000-fold increase in the number of potential targets for interaction.

Businesses and other motivated communicators have come to depend on software that performs Business Intelligence, Customer Relationship Management, and Enterprise Resource Planning tasks to facilitate accelerated, organized, prioritized, tracked and analyzed interaction with customers and other target groups (voters, consumers, pundits, opinion leaders, analysts, reporters, etc.). These systems have been extended to facilitate IM, E-mail, and telephone interactions. These media have been successfully integrated because of standards (jabber, pop3, smtp, pots, imap) that require that all participant applications conform to a set data format that allows interaction with this data in a predictable way.

Blogs and other CGM generate business value for their owners, both on private sites that use custom or open source software to manage their communications, and for massive public hosts. Because these sites can generate advertising revenue, there is a drive by author/owners to protect the content on these sites, so readers/subscribers/peers have to visit the site, and become exposed to revenue generating advertising, in order to participate in/observe the communication. Because of this financial disincentive, there is no unifying standard for blogs which contains complete data. RSS and Atom feeds allow structured communication of some portion of the communication on sites, but are often very incomplete representations of the data available on a given site. Sites also protect their content from being “stolen” by automated systems with an array of CAPTCHAs (“Completely Automated Public Turing test to tell Computers and Humans Apart”), email verification, mobile phone text message verification, password authentication, cookie tracking, Uniform Resource Locator (URL) obfuscation, timeouts and Internet Protocol (IP) address tracking.

The result is a massively diverse community that it would be very valuable to understand and interact with, which resists aggregation and unified interaction by way of significant technical diversity, resistance to complete information data standards, and tests that attempt to require one-to-one human interaction with content.

SUMMARY OF THE INVENTION

TruCast is a method for management, by way of gathering, storing, analyzing, tracking, sorting, determining the relevance of, visualizing, and/or responding to all available consumer generated media. Some examples of consumer generated media include web logs or “blogs”, mobile phone blogs or “mo-blogs”, forums, electronic discussion messages, Usenet, message boards, BBS emulating services, product review and discussion web sites, online retail sites that support customer comments, social networks, media repositories, and digital libraries. Any web hosted system for the persistent public storage of human commentary is a potential target for this method. The system is comprised of a coordinated software and hardware system designed to perform management, collection, storage, analysis, workflow, visualization, and response tasks upon this media. This system permits a unified interface to manage, target, and accelerate interactions within this space, facilitating public relations, marketing, advertising, consumer outreach, political debate, and other modes of directed discourse.

BRIEF DESCRIPTION OF THE DRAWINGS

The preferred and alternative embodiments of the present invention are described in detail below with reference to the following drawings.

FIGS. 1A-1B show an example system for consumer generated media reputation management;

FIG. 2 shows a method for consumer generated media reputation management;

FIG. 3 shows incoming data from collection being delivered to an ingestion system in one embodiment;

FIG. 4 is a depiction of one embodiment of a CGM site discovery system;

FIG. 5 provides an overview of ingestion in one embodiment;

FIG. 6 shows manual scoring in one embodiment;

FIGS. 7-9 show the smooth transition between user scoring and automated scoring, in one embodiment;

FIG. 10 is a depiction of one embodiment of a CGM response engine;

FIGS. 11-13 show screen shots of a registration and response feature;

FIG. 14 shows an example screenshot of the TruCast Login Authentication screen;

FIG. 15 shows an example screenshot of a user interface homepage;

FIG. 16 shows an example screenshot of an account manager panel;

FIG. 17 shows an example screenshot of a user manager panel;

FIG. 18 shows an example screenshot of a topic manager panel;

FIG. 19 shows an example screenshot of a topic manager panel with the keyphrase tab activated;

FIG. 20 shows an example screenshot of a sorting manager;

FIG. 21 shows an example screenshot of the sorting manager with the user tab activated;

FIG. 22 shows an example screenshot of a scoring manager;

FIG. 23 shows an example screenshot of a scoring manager with a new topic creator screenshot activated;

FIG. 24 shows an example screenshot of a response manager;

FIG. 25 shows an example screenshot of an administrative queue;

FIG. 26 shows an example screenshot of a dashboard launcher;

FIG. 27 shows an example screenshot of an impact dashboard;

FIG. 28 shows an example screenshot of a sentiment dashboard;

FIG. 29 shows an example screen shot of a sentiment history dashboard;

FIG. 30 shows an example screenshot of an authority map dashboard;

FIG. 31 shows an example screenshot of a data drilldown dashboard;

FIG. 32 shows an example screenshot of an ecosystem map dashboard;

FIG. 33 shows an example screenshot of an ecosystem map zoom out view;

FIG. 34 shows an example screenshot of a sentiment summary;

FIG. 35 shows an example screenshot of a set of top lists;

FIG. 36 shows an example screenshot of reporting;

FIG. 37 shows an example screenshot of an aggregate performance dashboard; and

FIG. 38 shows a system overview in detail.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

FIG. 1A illustrates an example of a suitable computing system environment 100 on which an embodiment of the invention may be implemented. The computing system environment 100 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of embodiments of the invention. Neither should the computing environment 100 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 100.

Embodiments of the invention are operational with numerous other general-purpose or special-purpose computing-system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with embodiments of the invention include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed-computing environments that include any of the above systems or devices, and the like.

Embodiments of the invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Embodiments of the invention may also be practiced in distributed-computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed-computing environment, program modules may be located in both local- and remote-computer storage media including memory storage devices.

With reference to FIG. 1A, an exemplary system for implementing an embodiment of the invention includes a computing device, such as computing device 100. In its most basic configuration, computing device 100 typically includes at least one processing unit 102 and memory 104.

Depending on the exact configuration and type of computing device, memory 104 may be volatile (such as random-access memory (RAM)), non-volatile (such as read-only memory (ROM), flash memory, etc.) or some combination of the two. This most basic configuration is illustrated in FIG. 1A by dashed line 106.

Additionally, device 100 may have additional features/functionality. For example, device 100 may also include additional storage (removable and/or non-removable) including, but not limited to, magnetic or optical disks or tape. Such additional storage is illustrated in FIG. 1A by removable storage 108 and non-removable storage 110. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Memory 104, removable storage 108 and non-removable storage 110 are all examples of computer storage media. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by device 100. Any such computer storage media may be part of device 100.

Device 100 may also contain communications connection(s) 112 that allow the device to communicate with other devices. Communications connection(s) 112 is an example of communication media. Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio-frequency (RF), infrared and other wireless media. The term computer-readable media as used herein includes both storage media and communication media.

Device 100 may also have input device(s) 114 such as keyboard, mouse, pen, voice-input device, touch-input device, etc. Output device(s) 116 such as a display, speakers, printer, etc. may also be included. All such devices are well-known in the art and need not be discussed at length here.

Referring now to FIG. 1B, an embodiment of the present invention can be described in the context of an exemplary computer network system 200 as illustrated. System 200 includes an electronic client device 210, such as a personal computer or workstation, that is linked via a communication medium, such as a network 220 (e.g., the Internet), to an electronic device or system, such as a server 230. The server 230 may further be coupled, or otherwise have access, to a database 240 and a computer system 260. Although the embodiment illustrated in FIG. 1B includes one server 230 coupled to one client device 210 via the network 220, it should be recognized that embodiments of the invention may be implemented using one or more such client devices coupled to one or more such servers.

In an embodiment, each of the client device 210 and server 230 may include all or fewer than all of the features associated with the device 100 illustrated in and discussed with reference to FIG. 1A. Client device 210 includes or is otherwise coupled to a computer screen or display 250. As is well known in the art, client device 210 can be used for various purposes including both network- and local-computing processes.

The client device 210 is linked via the network 220 to server 230 so that computer programs, such as, for example, a browser, running on the client device 210 can cooperate in two-way communication with server 230. Server 230 may be coupled to database 240 to retrieve information therefrom and to store information thereto. Database 240 may include a plurality of different tables (not shown) that can be used by server 230 to enable performance of various aspects of embodiments of the invention. Additionally, the server 230 may be coupled to the computer system 260 in a manner allowing the server to delegate certain processing functions to the computer system.

In one embodiment, the methods and systems are implemented by a coordinated software and hardware computer system. This system is comprised of a set of dedicated networked servers controlled by TruCast. The servers are installed with a combination of commercially available software, custom configurations, and custom software. A web server is one of those modules, which exposes a web based client-side UI to customer web browsers. The UI interacts with the dedicated servers to deliver information to users. The cumulative logical function of these systems results in a system and method referred to as TruCast.

In alternate embodiments, the servers could be placed client side, could be shared or publicly owned, could be located together or separately. The servers could be the aggregation of non-dedicated compute resources from a Peer to Peer (P2P), grid, or other distributed network computing environments. The servers could run different commercial applications, different configurations with the same or similar cumulative logical function. The client to this system could be run directly from the server, could be a client side executable, could reside on a mobile phone or mobile media device, could be a plug-in to other Line of Business applications or management systems. This system could operate in a client-less mode where only Application Programming Interface (API) or eXtensible Markup Language (XML) or Web-Services or other formatted network connections are made directly to the server system. These outside consumers could be installed on the same servers as the custom application components. The custom server-side engine applications could be written in different languages, using different constructs, foundations, architectural methodologies, storage and processing behaviors while retaining the same or similar cumulative logical function. The UI could be built in different languages, using different constructs, foundations, architectural methodologies, storage and processing behaviors while retaining the same or similar cumulative logical function.

FIG. 2 shows a method for consumer generated media reputation management. The TruCast system can be broken down into elements, which include, but are not limited to, the following: collection, ingestion, analysis, reporting and response.

Collection

In one embodiment, the Collection system gathers the majority of information about all CGM content online. This is a weighted, prioritized goal because TruCast functions in a weighted, prioritized way. This prioritization system is an optionally advantageous element of the collection system, called the Collection Manager. The Collection Manager receives input from internal and external sources about what sites have information of value, weights that information against a set of pre-described and manipulatable co-factors to allow tuning, and prioritizes the execution of collection against those sites.

In order to collect data from a blog site, an automated web scripting and parsing system called a robot is built. An individual “robot” is a sophisticated, coordinated script which informs a software engine of how to navigate, parse, and return web information. Every web site is comprised of code in one of several popular languages, which software applications called web browsers “render” or convert to a visually appealing “web site”. A robot, similar to a browser, interprets site code to render an output. The desired output is not the “web site” that a browser would create, but an XML document, with columns of information about the content stored on a given site. Because robots are accessing the code, and not the rendered page, they have access to markup structures in the code which identify where specific content of interest is stored within the code. Robots use navigation based on Document Object Model (DOM) trees, regular expression pattern matching, conditional parsing, pre-coded transformations, mathematical and logical rules, tags, comments, formatting, and probability statistics to extract the specific content TruCast, in one embodiment, uses from raw web site code. Functions which perform this parsing are abstracted and codified in the robot engine, which is instructed on specific actions by a specific robot script. In pseudo-code, a robot designed to gather all of the blog content on a WordPress site would be scripted thusly: Load X URL, read code until “<bodytext>” is found, return all text until “</bodytext>” is found. If it is found, create row 1 and store this text in column A row 1. Find link with the word “next” in it, follow this link. Read code until “<bodytext>” is found, return all text until “</bodytext>” is found. If it is found, create row 2 and store this text in column A row 2.
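By way of illustration only, the pseudo-code above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the robot engine itself: the “<bodytext>” markers are taken from the pseudo-code, while the in-memory PAGES dictionary, the example.test URLs, and the function name run_robot are hypothetical stand-ins so that the sketch runs without network access.

```python
import re

# Hypothetical in-memory "site": URL -> raw page code.
# A real robot would retrieve this code over the network instead.
PAGES = {
    "http://example.test/page1":
        '<bodytext>First post</bodytext> '
        '<a href="http://example.test/page2">next</a>',
    "http://example.test/page2":
        '<bodytext>Second post</bodytext>',
}

# Text between the markers named in the pseudo-code.
BODY_RE = re.compile(r"<bodytext>(.*?)</bodytext>", re.DOTALL)
# A hyperlink whose visible text contains the word "next".
NEXT_RE = re.compile(r'<a\s+href="([^"]+)"[^>]*>[^<]*next[^<]*</a>',
                     re.IGNORECASE)


def run_robot(start_url, fetch=PAGES.get, max_pages=100):
    """Follow "next" links from start_url, storing each body text as a row."""
    rows = []
    seen = set()
    url = start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        code = fetch(url)           # "Load X URL"
        if code is None:
            break
        body = BODY_RE.search(code)
        if body:                    # "If it is found, create row N"
            rows.append({"A": body.group(1).strip()})
        nxt = NEXT_RE.search(code)  # "Find link with the word next in it"
        url = nxt.group(1) if nxt else None
    return rows


rows = run_robot("http://example.test/page1")
```

Each entry of rows corresponds to one row of the output document, with the body text stored under column “A”. The seen set and max_pages limit are additions of this sketch that keep a circular chain of “next” links from looping indefinitely.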

This is a clearly incomplete example, as a plurality of robots have the ability to gather and transform a very complete set of knowable information from every website visited, including the full body text, author's name, date of the post, permalink to the post, title of the post, its position on the page, how many comments it has, the full information about those comments, including author, date, order, body, any hyperlinks, graphics, scripts, emoticons, or other multimedia files included in a post, comment or site. Robots can be designed to gather data from only an individual site, or made more general to accommodate variation amongst similar sites. Robots parse the gamut of non-structured web site code into XML encoded text that meets a predefined data specification of the design. The system, in one embodiment, collects all posts, all comments, and all desired content from every page that a robot visits.

Robots are not limited to these methods for content parsing: hierarchical temporal memory analysis, probability-based positive heuristics, and structural inference technologies can be used to make robots capable of collecting information from a wider variety of sites.

Some sites have full-data RSS or Atom feeds (different than the typically truncated feeds), for which a specific set of robots exist. The system also has data vendors who deliver full-data feeds in several formats; these feeds are converted to the XML data spec by another class of robots. Robots are not limited to web content collection, but represent a scriptable system for parsing and transforming incoming and outgoing data based on pre-defined rules.

FIG. 3 depicts one embodiment of a CGM data collection system. In one embodiment, the first step of this system is to prioritize possible targets for collection. Inputs to this prioritization include, but are not limited to, sites specifically requested by customers (305), the number of responses the system has written to a given site (310), the number of accounts that find content from this site relevant (315), the total count of relevant content available on the site (320), the date of the most recent post written on the site (325) and the historical performance of the system at gathering content from this site (330). The priority database maintains an updated list of co-factors which are calculated priorities for each site based on these inputs. When the Collection Manager (340) determines that it has excess bandwidth/resources to execute more robots, it polls the priority database (335) to determine which robots (345) to run, and then executes them. The Collection Manager also stores the records of robot activity so that it can add this information to the priority database (335). Robots, once launched by the Collection Manager, interface with their targets (350) to return XML-formatted CGM content to the Ingestion system (355).
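One possible way to combine the inputs (305) through (330) into a single calculated priority is a weighted sum, sketched below in Python. This is an illustrative assumption only: the disclosure does not specify a formula, and the weight values, the recency term, the SiteStats field names, and the function names priority and pick_robots are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SiteStats:
    url: str
    customer_requested: bool  # input (305)
    responses_written: int    # input (310)
    accounts_relevant: int    # input (315)
    relevant_posts: int       # input (320)
    last_post: date           # input (325)
    success_rate: float       # input (330), fraction of successful past runs


# Hypothetical, tunable co-factor weights; the source gives no values.
WEIGHTS = {
    "customer_requested": 10.0,
    "responses_written": 0.5,
    "accounts_relevant": 1.0,
    "relevant_posts": 0.01,
    "recency": 5.0,
    "success_rate": 2.0,
}


def priority(site, today, weights=WEIGHTS):
    """Weighted sum of the prioritization inputs for one site."""
    days_stale = max((today - site.last_post).days, 0)
    recency = 1.0 / (1.0 + days_stale)  # 1.0 for a post made today
    return (
        weights["customer_requested"] * float(site.customer_requested)
        + weights["responses_written"] * site.responses_written
        + weights["accounts_relevant"] * site.accounts_relevant
        + weights["relevant_posts"] * site.relevant_posts
        + weights["recency"] * recency
        + weights["success_rate"] * site.success_rate
    )


def pick_robots(sites, today, capacity):
    """Return the URLs of the highest-priority sites that fit in capacity."""
    ranked = sorted(sites, key=lambda s: priority(s, today), reverse=True)
    return [s.url for s in ranked[:capacity]]


today = date(2007, 5, 7)
sites = [
    SiteStats("http://quiet.example.test", False, 0, 1, 50,
              date(2007, 4, 7), 0.5),
    SiteStats("http://busy.example.test", True, 2, 3, 100,
              date(2007, 5, 7), 0.9),
]
chosen = pick_robots(sites, today, capacity=1)
```

Here capacity stands in for the excess bandwidth/resources that the Collection Manager detects before polling the priority database. Changing the values in WEIGHTS is one way the co-factors could be tuned.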

FIG. 4 is a depiction of one embodiment of a CGM site discovery system. Site discovery is the process of finding the URLs of new CGM sites on the Internet. Discovery uses three methods, and their coordination is performed by the Discovery Robot Manager (372). This system retains performance information of the three methods, and determines what percentage of available resources (CPU time, bandwidth) to spend running each of the three methods in order to discover the most new URLs possible. The Discovery Robot Manager receives input from the Discovery Targets DB (370), which stores all of the information to execute each of the three methods, most notably the URL targets for each method. This system is fed information from customer or internal research discovered URLs (362), URLs of known search engines (364), URLs found in the post bodies of CGM content (366), and the URLs of the directory pages for each of the major blog hosts (368). Each method uses this information and a script for web interaction, called a robot, to discover new CGM URLs.

The first method is called the “Real Estate” method. When the Discovery Robot Manager (372) determines that it is efficient to do so, it will launch a Real Estate robot for a specific search engine (374), and supply it with a list of keywords from all account topics which is held in the Discovery Targets DB (370). This robot will visit the search engine and fill in the search form with each keyword, and gather, by way of regular expression pattern extraction, the URLs of the results from the first 4 pages of results. This information will be delivered in XML format to the de-duplicator (388), which will eliminate known URLs, and then be stored in the Collection Prioritization DB (390) for collection.

The second method, Site Search, is very similar to the Real Estate method and uses the same robots, but behaves in a different way with different input. The Real Estate robots use keywords from the topics in the accounts. The Site Search method has a pre-determined list of keyphrases designed to be representative of the full gamut of discussion on the web. The Discovery Robot Manager (372) collects this information from the Discovery Targets DB (370) and executes a Site Search robot, which searches the input keyphrases to retrieve the first 20 pages of results. Because of the much larger number of searches, these robots are designed to heavily obfuscate and avoid patterned interaction with Search Engine servers. The URLs discovered by Site Search robots are delivered to the de-duplicator (388), and from there to the Collection Prioritization DB (390). Site Search robots can alternatively be sent input URLs that are blog sites instead of search engines. Within this context they will visit every hyperlink on the site, searching for new links to previously-unknown sites. These links are delivered as new URL output similar to the other methods.

The third method, called Host Crawl, uses different robots to visit the directory listing pages on major CGM hosting engines. These directory pages' URLs are stored in the Discovery Targets DB (370). The Discovery Robot Manager (372) launches a Host Crawl Robot (376) which visits a CGM Host directory page (382) and visits all of the hyperlinks on that page, retrieving all of the URLs that are available. This information is sent to the de-duplicator (388) and on to the Collection Prioritization DB (390).
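The two steps common to all three discovery methods, regular expression extraction of URLs from page code followed by elimination of known URLs, can be sketched in Python as shown below. This is a minimal sketch under stated assumptions: the href pattern, the normalization rule (lower-cased host, trailing slash removed), and the names extract_urls, normalize, and deduplicate are hypothetical, and the disclosure does not state how the de-duplicator (388) compares URLs.

```python
import re
from urllib.parse import urlsplit

# Absolute http(s) links in href attributes (an assumed page format).
URL_RE = re.compile(r'href="(https?://[^"#]+)"', re.IGNORECASE)


def extract_urls(page_code):
    """Return every absolute link found in the raw page code, in order."""
    return URL_RE.findall(page_code)


def normalize(url):
    """Reduce a URL to a comparison key: lower-cased host plus path."""
    parts = urlsplit(url)
    return parts.netloc.lower() + parts.path.rstrip("/")


def deduplicate(found, known_keys):
    """Drop URLs whose key is already known or repeated within found."""
    seen = set(known_keys)
    fresh = []
    for url in found:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            fresh.append(url)
    return fresh


# Hypothetical results page and set of already-known sites.
page = (
    '<a href="http://Blog.example.test/">a</a> '
    '<a href="http://blog.example.test">b</a> '
    '<a href="http://old.example.test/">c</a> '
    '<a href="http://new.example.test/post">d</a>'
)
known = {"old.example.test"}
new_urls = deduplicate(extract_urls(page), known)
```

In this sketch new_urls holds the URLs that would be passed on for storage in the Collection Prioritization DB (390). A production de-duplicator would likely consult a database rather than an in-memory set.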

Ingestion

FIG. 5 depicts one embodiment of a data ingestion system. This system receives input from the XML data outputs of robots launched and administered by the Collection Manager (400). These XML data sources are queued in an Ingestion Queue (405). This queuing process is a buffering function, because all of the remaining steps use a stream processing method that requires a steady stream of content to work at maximum efficiency. Due to the dynamic nature of the volume of XML data input, the Ingestion Queue holds a backlog of incoming data and outputs it at a steady rate, currently 500 docs/second. This flow of data is delivered first to the de-duplicator (410), a system which compares incoming CGM content information to all previously collected content, based on posted date, permalink URL, and post body, to ensure that the data does not already exist in the system. Once this system has culled duplicate documents, it hands the remaining documents to a UREF constructor (415), which, in one embodiment of the invention, creates a new unique ID number to easily index and track unique content within the system. Next, content is delivered to a GMT time aligner (420), which converts all date and time stamps to be relative to Greenwich Mean Time. Next, this XML format information is transformed using an XSLT (eXtensible Stylesheet Language Transformations) processor (425), which reformats the data for rapid delivery into the indexing system and relational DB systems (430).

In one embodiment, TruCast performs several cleaning and refining steps upon incoming CGM content enclosed in the XML format. The system eliminates duplicate content using a fuzzy logic comparison between existing stored content and incoming new content based on post body, permalink, and date information. This comparison is tunable and weighted, where positive matches are clear indicators of duplication, but agreement is optionally advantageous across multiple values to confirm duplication. For example, if two posts came from exactly the same date and time to the second, it is unlikely, but possible, that they are truly different unique posts. If, however, the body text is 90% the same, and the URL is 90% the same, it is extremely unlikely that the two posts are unique. On body text, this comparison includes text clustering analysis, using word counts as a computationally inexpensive way to further evaluate uniqueness. Content that is malformed or incomplete according to the data spec is removed, and warnings are sent to the responsible collection manager element. Once a document is determined to be unique, a UREF (unique reference) value is created and appended to it so that there is a single relevant value with which to index this information within the system. All incoming post dates are aligned to GMT.

In one embodiment, TruCast delivers all prepared content into an indexing system which formats the data in such a way that it can be rapidly searched based on relationships to other data, keyword presence, account relevance, and date. This structure includes storage of data within a distributed indexed data repository as well as several SQL databases. Each SQL database is optimized for a different consuming system: the UI, the visualization systems, the reporting and statistics systems, the collection priority database, and the target discovery database, as well as the individual account level data stores.
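By way of non-limiting illustration, the weighted duplicate comparison, UREF assignment, and GMT alignment described above might be sketched in Python as follows. The weights, the threshold, the use of `difflib.SequenceMatcher` as the similarity measure, and the use of a random UUID as the UREF are all assumptions of this sketch; the specification does not state how these values are computed.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from difflib import SequenceMatcher


@dataclass
class Post:
    body: str
    permalink: str
    posted: datetime  # timezone-aware timestamp as collected


# Assumed tuning values: no single field can confirm duplication on its own.
WEIGHTS = {"body": 0.5, "permalink": 0.3, "date": 0.2}
THRESHOLD = 0.7


def to_gmt(ts: datetime) -> datetime:
    """Align a timezone-aware timestamp to GMT (UTC)."""
    return ts.astimezone(timezone.utc)


def similarity(a: str, b: str) -> float:
    """Fuzzy string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def duplicate_score(new: Post, old: Post) -> float:
    """Weighted agreement across body, permalink, and posted date."""
    date_match = 1.0 if to_gmt(new.posted) == to_gmt(old.posted) else 0.0
    return (WEIGHTS["body"] * similarity(new.body, old.body)
            + WEIGHTS["permalink"] * similarity(new.permalink, old.permalink)
            + WEIGHTS["date"] * date_match)


def ingest(new: Post, stored: list[Post]):
    """Return (uref, gmt_timestamp) for a unique post, or None for a duplicate."""
    if any(duplicate_score(new, old) >= THRESHOLD for old in stored):
        return None
    stored.append(new)
    return uuid.uuid4().hex, to_gmt(new.posted)


pacific = timezone(timedelta(hours=-8))
first = Post("aaaa", "http://x.example/1", datetime(2007, 5, 7, 12, 0, tzinfo=pacific))
same_time_other_post = Post("zzzz", "http://y.example/2", first.posted)
repost = Post("aaaa", "http://x.example/1", datetime(2007, 5, 8, 9, 0, tzinfo=pacific))

store: list[Post] = []
result_first = ingest(first, store)
result_other = ingest(same_time_other_post, store)
result_repost = ingest(repost, store)
```

With these assumed weights, two posts that share only a timestamp are kept as distinct, while a post whose body and permalink match stored content is rejected even though its date differs.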

Analysis

In one embodiment, TruCast is designed to determine, with a high degree of confidence, the conceptual relevance of a given piece of CGM content to a “topic” or concept space. Topics can be of any breadth (“War” is just as sufficient a topic as “2002 Chevy Silverado Extended Cab Door Hinge Bolt Rust”). Topics are abstract identifiers of relevance information about a given piece of CGM content. Each topic can also be understood as a list of “keyphrases” or keywords with Boolean modifiers. Each topic can contain an unlimited number of keyphrases that work as the first tier of pattern matching to identify content that is relevant to an individual account. Each post discovered by the system could, in one embodiment, be relevant to one topic, many topics, many topics across many accounts, or no topics at all.

FIG. 6 depicts one embodiment of a system for manually appending topic relevance and topical sentiment to blog posts. This process begins with the discovery of potentially relevant content by way of keyphrases. Keyphrases are grouped into topics. Topics and keyphrases are created by users (455) in the Topic Manager panel (460) within the UI. Once a new topic and keyphrase is created, this information is transmitted to the indexing system (465), which begins to examine all incoming data for matches against this keyphrase. The information is also handed to the relational database system (470), which is also the StoreDB component of the Historical Data Processor as illustrated in FIG. 38. This system examines all data that has already been processed to see if it matches this new keyphrase. This separation accelerates both processes, because the structure in (465) is optimized for stream processing and the structure in (470) is optimized for narrow, deep searches against a significantly larger dataset. Information from both of these systems is passed in queue form to the Scoring Manager (475), which provides a UI for users to annotate topic relevance and topic sentiment information, which is stored in the relational DB (485). In one embodiment, TruCast contains a user interface that allows users to create topics, create keyphrases that are used to search for potentially relevant posts for that topic, place potentially relevant content into a queue for review, review the text and context of individual content, mark that content as relevant to none, one, or many topics (thereby capturing human judgment of relevance), and store that information in the relational database. This system is called the Scoring Manager.

This method, where a post is matched by keyphrase, scored by humans, and delivered to the outputs of TruCast (in one embodiment, visualizations, reports, and response), is a basic “manual” behavior of the system.

The behavior of this tiered system of relevance discovery and analysis changes over time to reflect the maturation of the more sophisticated elements of the system, as their contextual requirements are much higher. A keyphrase match is absolute, in one embodiment; if a post contains an appropriate keyphrase, there is no question as to whether a match exists. The Conceptual Categorization system is built to apply a series of exemplar-based prediction algorithms to determine the conceptual relevance of a given post independent of exact keyphrase match. This makes the system, in one embodiment, more robust and provides more human-relevant information. In an exemplary embodiment a blog post body includes the following text: “I really enjoy looking out my windows to see the vista out in front of my house. Buena! It is so great! I wish my computer was so nice, it is a little broken edgy eft sadly.” (EX. 1)

A topic for the Microsoft Corporation, looking for the words “windows vista computer” in order to find online discussion about their new operating system, would find this post by keyphrase match, despite the fact that the user discusses using “edgy eft”, which is a code name for Ubuntu 6.10, a competitor's operating system. A topic for Milgard Windows and Doors Corporation that is looking for discussion about windows in need of repair would find this same post by way of the keyphrase “broken house windows”, despite the fact that the writer is clearly enjoying looking out of his unbroken windows. The Disney Corporation, looking for discussion about their film company “Buena Vista”, would find this post, which has nothing to do with them at all. A biology researcher looking for references to immature red newts would search for “Eft” only to be sadly disappointed by another result about Ubuntu's software. In all of these cases keyphrase matches have proven insufficient to successfully match relevant content to interested parties. Boolean modifiers help (vista NOT Buena) but consistently fall far short of expectations, and require non-intuitive and time-consuming research and expertise.
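The false matches described for EX. 1 can be reproduced with a short Python sketch. It is illustrative only and assumes that a keyphrase is treated as an unordered set of required words with an optional set of excluded (NOT) words; the function names `tokens` and `keyphrase_match` are hypothetical.

```python
import re

EX1 = ("I really enjoy looking out my windows to see the vista out in front "
       "of my house. Buena! It is so great! I wish my computer was so nice, "
       "it is a little broken edgy eft sadly.")


def tokens(text: str) -> set[str]:
    """Lower-case the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def keyphrase_match(body: str, required, excluded=()) -> bool:
    """True when every required word is present and no excluded word is."""
    words = tokens(body)
    return all(w in words for w in required) and not any(w in words for w in excluded)


microsoft = keyphrase_match(EX1, ["windows", "vista", "computer"])
milgard = keyphrase_match(EX1, ["broken", "house", "windows"])
disney = keyphrase_match(EX1, ["buena", "vista"])
biologist = keyphrase_match(EX1, ["eft"])
vista_not_buena = keyphrase_match(EX1, ["vista"], excluded=["buena"])
```

All four topics match the post even though it is relevant to none of them, and the Boolean modifier only removes the match for the one topic that anticipated the collision.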

Automated Conceptual Categorization

FIGS. 7-9 show the smooth transition between user scoring and automated scoring and depict the progression of the operation of one embodiment for an automated categorization and sentiment analysis system. This progression occurs from the early state, where the automated system performs poorly due to a lack of contextual examples, to a mature state where the automated system performs excellently as a result of robust contextual examples. The system, in one embodiment, reacts to this improvement by reducing the rate of post queue delivery to users and increasing the acceptance of analyzed posts from the automated system as confidence ratings and exemplar set sizes increase. This process accepts input from the ingestion system (350) into two separate queues. The first queue delivers content to the scoring manager (610), where it is scored by humans (615) and then delivered to the per-topic exemplar sets (620) based on topic relevance, to the relational database (625) for storage and use in the response, visualization, and report sections, and to an agreement analysis system (645). A second queue delivers content to the automated categorization system, which accepts input from the per-topic exemplar sets, as well as topic performance and tuning information from the agreement analysis system (645). This system passes conceptually relevant content to the sentiment analysis systems, which also have access to the exemplar and agreement analysis tuning data. The automated systems append a “confidence” score to their evaluations, which is used as a threshold to decide trust in the evaluation's accuracy. In the early behavior of the system, due to the lack of examples and agreement analysis tuning data, this confidence score is often very low. As more manual scoring is completed, and agreement analysis improves, the percentage of data flowing into the automated systems increases, and once performance is proven on the full data stream, the flow of data to the manual scoring application begins to decrease. The agreement analysis system continually tracks the varying level of inaccuracy that the automated systems can exhibit as a result of changes within topical vernacular, user vocabulary, new common phrases, inflections, or other changes in the typical word patterns present in incoming CGM content; these variations are reflected by the dynamic adjustment of the percentages of data flowing into these two systems. Over time, given sufficient, accurate scoring by humans, the automated systems should be capable of accurate analysis on 100% of incoming documents, which would reduce the role of required human interaction to only providing audit and contemporary vernacular updates by way of minimal scoring.

In one embodiment, TruCast contains a Conceptual Categorization system which has functionality to evaluate posts for relevance by way of statistical analysis on examples provided by humans using the scoring system. Because humans are reviewing the content from a specific customer's perspective, that content is reliably scored in context. If the above example post (EX. 1) were scored by a human scorer for Microsoft, it would be found irrelevant to the Windows Vista operating system. By statistical analysis of hundreds of posts marked relevant or irrelevant to individual topics, the system can utilize not just keywords, but the entire body of the post to determine relevance. This statistical calculation leverages text clustering assisted by stop word exclusion, noun and pronoun weighting, punctuation observation, and stemming near-word evaluations. For non-text categorization analysis, TruCast, in one embodiment, can leverage Optical Character Recognition (OCR) image-to-text conversion, Fast Fourier Transform (FFT) and Granular Synthesis (GS) analysis based speech-to-text conversion, as well as Hierarchical Temporal Memory (HTM) processing. This comparison, and the resultant threshold-filtered probability that a given post is relevant to a given topic, allows TruCast, in one embodiment, to assign this meta-information. This method agrees far more closely with human analysis than keyphrase matching does. It also has the optionally advantageous feature of being continually tuned by ongoing scoring within the UI, which provides fresh exemplar data over time.
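By way of non-limiting illustration, exemplar-based categorization with a confidence threshold could be sketched in Python as follows. The specification does not name the prediction algorithms used, so this sketch substitutes a simple nearest-centroid comparison over word counts with cosine similarity; the stop word list, the threshold value, and all function names are assumptions.

```python
import math
import re
from collections import Counter

# Small illustrative stop word list (assumed).
STOP_WORDS = {"the", "a", "an", "is", "it", "to", "of", "my", "i", "so", "in", "and", "was"}


def vectorize(text: str) -> Counter:
    """Word-count vector of the full post body with stop words removed."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def centroid(texts) -> Counter:
    """Summed word counts of an exemplar set."""
    total = Counter()
    for text in texts:
        total.update(vectorize(text))
    return total


def categorize(post: str, relevant, irrelevant, threshold: float = 0.6):
    """Return (label, confidence); label is None when confidence is too low."""
    vec = vectorize(post)
    sim_rel = cosine(vec, centroid(relevant))
    sim_irr = cosine(vec, centroid(irrelevant))
    if sim_rel + sim_irr == 0:
        return None, 0.0
    confidence = max(sim_rel, sim_irr) / (sim_rel + sim_irr)
    label = sim_rel > sim_irr
    return (label if confidence >= threshold else None), confidence


# Human-scored exemplars for a hypothetical "Windows Vista" topic.
relevant = ["installed windows vista on my computer and the new operating system crashed",
            "vista operating system upgrade broke my drivers"]
irrelevant = ["looking out the windows of my house at the vista",
              "the view from my house windows is great"]

label, confidence = categorize(
    "I enjoy looking out my windows to see the vista in front of my house",
    relevant, irrelevant)
```

Because the whole body of the post is compared against scored exemplars, the post is judged irrelevant to the operating-system topic despite containing the words “windows” and “vista”. A post that falls below the threshold returns no label and, in the workflow described above, would be a candidate for human scoring.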

Automated Sentiment Analysis

When users score content for relevance in the scoring manager, they also may assert the sentiment of the content for each topic to which it is relevant, from the perspective of their account. Users will mark, from their perspective (as informed by a set of scoring rules described by user administrators), the sentiment reflected about each topic. This information will be stored for later use in a relational database.

These human markup actions serve two purposes. The first is to capture this data for direct use within a response system and within a series of data visualizations that leverage topic and sentiment information to elucidate non-obvious information about the content TruCast collects, in one embodiment. This is the “manual” path for data to flow through the system, in one embodiment. The second is that these posts serve as example data for an exemplar-driven automated sentiment analysis system that mirrors the conceptual categorization system.

Similar to the process of categorization, the system, in one embodiment, leverages an exemplar set of documents to perform an automated algorithmic comparison in order to determine the sentiment, per topic, contained within an individual post. This requires a larger number of examples than categorization analysis (~100 per sentiment value per topic) due to the four different stored sentiment values: “good”, “bad”, “neutral”, and “good/bad”. Due to the significant complexity of sentiment language within human language, additional processing is performed upon each document to improve the accuracy of the analysis. A lexicon of sentimental terms is stored within the system, and their presence has a weighted impact on the analysis. Negation terms and phrase structures also alter the values associated with sentimental terms. A stop words list eliminates connective terms, object nouns, and other non-sentimental terms within the text, reducing the noise the comparison has to filter through. Sentence detection uses linguistic analysis to subdivide posts into smaller sections for individual analysis. A series of algorithms are compared for accuracy and performance on a per-topic basis, to allow the performance of the analysis system to be tuned to each topic.
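By way of non-limiting illustration, the lexicon weighting, negation handling, and sentence subdivision described above could be sketched in Python as follows. The lexicon entries, their weights, the negation list, and the rule for combining sentence scores into the four stored sentiment values are all hypothetical; the exemplar comparison itself is omitted from this sketch.

```python
import re

# Hypothetical weighted lexicon of sentimental terms.
LEXICON = {"great": 1.0, "enjoy": 1.0, "nice": 0.5,
           "broken": -1.0, "terrible": -1.5, "hate": -1.0}
# Hypothetical negation terms that invert the next sentimental term.
NEGATIONS = {"not", "never", "no", "isn't", "don't"}


def split_sentences(text: str) -> list[str]:
    """Naive sentence detection on terminal punctuation."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def sentence_score(sentence: str) -> float:
    """Sum lexicon weights; words outside the lexicon are ignored as noise."""
    score = 0.0
    negate = False
    for word in re.findall(r"[a-z']+", sentence.lower()):
        if word in NEGATIONS:
            negate = True
        elif word in LEXICON:
            score += -LEXICON[word] if negate else LEXICON[word]
            negate = False
    return score


def classify(text: str) -> str:
    """Map per-sentence scores onto the four stored sentiment values."""
    scores = [sentence_score(s) for s in split_sentences(text)]
    has_good = any(s > 0 for s in scores)
    has_bad = any(s < 0 for s in scores)
    if has_good and has_bad:
        return "good/bad"
    if has_good:
        return "good"
    if has_bad:
        return "bad"
    return "neutral"
```

Scoring each sentence separately is what allows a post that praises one aspect and criticizes another to be stored as “good/bad” rather than having the two signals cancel out.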

Automated Analysis Management

Both of these processes work upon the post-ingestion content, directing automatically analyzed documents into the remainder of the system workflow. This process reacts to the number of exemplar documents that are available. If incoming content is keyphrase-relevant to a specific topic, a determination is made as to whether sufficient exemplar documents have been gathered by the system from users. If not enough exemplar documents are available, that post is delivered to the scoring queue, which feeds content to the scoring manager interface. If some documents are present as exemplars, the system will attempt automated categorization and sentiment analysis, but still deliver posts to the scoring manager. This creates a pair of analysis results, one from the computer and one from the user. These are compared, and when sufficient alignment (agreement frequency) is reached, the system starts delivering auto-analyzed content directly to the reporting and response systems, saving human effort.

This is a sliding ratio from 100% being delivered to the UI and 0% being auto-analyzed, to only 1-10% being delivered to the UI and 100% being auto-analyzed. Once the ratio of content being reviewed by human scorers reaches 10%, and accurate performance of the automated analysis is maintained, mature operation of the automated systems has been achieved. This is the most efficient operation of the system, in one embodiment.
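By way of non-limiting illustration, the sliding ratio could be driven by the measured agreement frequency as in the following Python sketch. The minimum exemplar count, the target agreement level, and the linear interpolation between the two end states are assumptions; the specification states only the end points of the ratio.

```python
def agreement_frequency(pairs) -> float:
    """Fraction of posts where the human result equals the automated result."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(1 for human, auto in pairs if human == auto) / len(pairs)


def review_fraction(agreement: float, exemplar_count: int,
                    min_exemplars: int = 100,
                    target_agreement: float = 0.9,
                    floor: float = 0.10) -> float:
    """Fraction of keyphrase-relevant posts to deliver to human scorers.

    Starts at 1.0 (everything is scored by humans) and slides down to
    `floor` once agreement reaches `target_agreement`.
    """
    if exemplar_count < min_exemplars:
        return 1.0  # too few exemplars: all posts go to the scoring queue
    if agreement >= target_agreement:
        return floor  # mature operation
    return 1.0 - (1.0 - floor) * (agreement / target_agreement)


scored = [("good", "good"), ("bad", "good"), ("neutral", "neutral"), ("good", "good")]
current_agreement = agreement_frequency(scored)
```

If agreement later falls, for example because the vernacular of a topic shifts, the same function raises the share of posts sent back to human scorers.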

The system utilizes an aging and auditing system to ensure that the oldest human-scored posts are ejected from the exemplar set and replaced by new human-scored posts over time. The system also performs internal cluster analysis and ejects significant outliers from the system. Both of these processes are tunable by administrative control panels. The result of this aging and auditing should be that as the vernacular, word usage, and issues discussed internal to a given topic change over time, exemplar documents continue to reflect that change and accurately map relevance.

Reporting

In one embodiment, the system of databases that receives topic-relevant, analyzed content is connected to a series of web-based visualizations that allow users of the UI to understand valuable information about the discussions captured by the system. Visualizations are shown in FIGS. 27-38.

Response

FIG. 10 is a depiction of one embodiment of a CGM response engine. In this embodiment the Response Manager UI (752) is populated with a written response by a user (758). This user is evaluated for authorization permissions against a stored value in the Account Database (754). If the user does not have appropriate authorization, their response will be delivered to an authorization queue (756) to be approved by an administrator. If a response is not approved, it is deleted. If a responder has authorization, or their response is approved, it will be delivered to the Response Priority Processor (760), which determines if any delay or promotion is required for a given approved post. It also observes the original posted date of the content that is being responded to and prioritizes based on most recent posted dates. The Response Engine Manager (764) requests responses from the Response Priority Processor (760) to deliver to the registration and response robots. The Response Engine Manager checks the Response Performance DB (766) to see whether a response robot has already been created for a given URL. If it has not, the response and all associated information is sent to the Response Robot Constructor (772). This tool provides an interactive UI to allow semi-automated interaction with a target CGM site's registration and response systems to deliver the response to the site, and record the interaction. These interactions include loading pages, following hyperlinks, assigning input data to site form fields, navigating to web mail systems for authentication messages, completing CAPTCHA tests, interacting with IM and SMS systems, performing sequential interactions in correct order, and submitting forms. The result of these actions should be a newly registered user (if required by the site) and a response written to the blog site. The interaction is recorded and stored in the Registration and Response Robot sets (770, 774). If, when the Response Engine Manager is sent a response, it determines that a robot already exists, it will execute that robot without human interaction. This has the same effect, creating a new registration if required, and writing the response to the CGM site. Success or failure of robots and robot constructor actions is recorded in the Response Performance DB for evaluation and manual code re-work if required.
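By way of non-limiting illustration, the ordering performed by the Response Priority Processor and the robot lookup performed by the Response Engine Manager could be sketched in Python as follows. The class and function names, the `promoted` flag, and the representation of known robots as a set of site URLs are hypothetical.

```python
import heapq
import itertools
from datetime import datetime, timezone


class ResponsePriorityProcessor:
    """Serves promoted responses first, then responses to the newest posts."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker preserves submission order

    def submit(self, response_id: str, original_posted: datetime, promoted: bool = False):
        # Smaller keys are served first: promoted before normal, newer before older.
        key = (0 if promoted else 1, -original_posted.timestamp(), next(self._order))
        heapq.heappush(self._heap, (key, response_id))

    def next_response(self):
        """Return the next response id, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[1]


def dispatch(site_url: str, sites_with_robot: set) -> str:
    """Choose the delivery path for a response to the given site."""
    if site_url in sites_with_robot:
        return "run_existing_robot"
    return "response_robot_constructor"


queue = ResponsePriorityProcessor()
queue.submit("old", datetime(2009, 1, 1, tzinfo=timezone.utc))
queue.submit("new", datetime(2009, 6, 1, tzinfo=timezone.utc))
queue.submit("promoted", datetime(2008, 1, 1, tzinfo=timezone.utc), promoted=True)
served = [queue.next_response() for _ in range(4)]
```

In this sketch a response to a site with no recorded robot is routed to the constructor path, after which the site would be added to the set of known robots so that later responses run without human interaction.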

The response manager is a system that converts the task of responding to CGM content by way of comments into a manageable, scalable business process. All CGM systems that allow interactivity (>90%) have a web-based system for allowing readers of content to respond by way of a comment, note, or other stored message. This often requires that users register themselves on the site by providing a username, password, and other personal details. Sometimes this requires providing an e-mail address, to which an activation link is sent, or an instant messenger account, to which a password is sent. This is not too difficult for casual users to maintain, especially if they only interact with a few sites. Professional users, however, often have to interact with thousands of different sites. The system, in one embodiment, aims to reduce this workload for responders by automating the registration and response process.

Response Workflow

In one embodiment, the TruCast UI system facilitates a workflow for many users to interact in a coordinated, managed way with CGM content. Once a post has been successfully analyzed by either a user in the scoring manager or the automated analysis systems, it becomes available within the response manager. This is a UI system for a user to write a comment in response to relevant posts. The UI has two halves: one shows information about the post being responded to (author, date, body text, and other comments from within the thread, as well as stats about the author and site responsible for the content), and the second contains the new response the user is writing. The system provides an interface called the response vault for managers to pre-write message components, fragments of text, names, stats, and pieces of argument that they would like responders to focus on. These snippets can be copied into the response body during authoring. Once a user is done writing a response, they can click a “send” button, which delivers the newly written response to the relational database.

Response Automation

FIGS. 11-13 show screen shots of a registration and response feature. Once the system, in one embodiment, receives a response record from the response manager, it determines which blog site contains the original message, and the link to the response page for that site and message. If the system, in one embodiment, has never written a response to that site before, the record is delivered to the response interactor UI or Response Robot Constructor, which is run by company employees. This UI allows an employee to visit the appropriate site, navigate to the appropriate fields, and assign the information from the record to fields on the site that will cause the site to record a response. This action is recorded and converted into a script, which is stored as a new robot for later re-use. If TruCast has already written a response to a given site, this script will be used, eliminating the need for repeated human interaction.

This system utilizes a similar engine and scripting methodology as the collection system. Registration and Response robots are scripted automations which interpret the code of CGM content pages, web pages, POP3 or web-based e-mail systems, and other data structures, and perform pre-determined, probabilistic, or rule-driven interactions with those structures. By interpreting page code and scripted instructions, they can imitate the actions of human users of these structures by executing on-screen navigation functions, inserting data, gathering data, and reporting success or failure. An example registration robot would be given, as data input, the registration information for an individual user of the system, in one embodiment, and the URL of a site that the user wishes to register on. The robot would visit the site, navigate by markers pre-identified in the page code to the appropriate form locations to insert this information, confirm its insertion, and report success, as well as any output information from the site. An example response robot would accept as input the registration information for a given user of the system, in one embodiment, the blog response they have written, and the URL of the site that the user wishes to respond to. The robot would load the site into memory, navigate the page by way of hyperlinks or pre-determined, probabilistic, or rule-driven information, examine the page source code to discover the appropriate form fields to insert this input data into, do so, and report success. Other embodiments of this solution could include purpose-built scripts that perform the same assignment and scripted interaction with CGM sites to perform registration and response tasks. Smaller scale systems would have users perform the manual field entry and navigation tasks, but capture these interactions for conversation involvement identification and maintenance by the analysis systems.

There are several sophisticated systems for preventing automated interaction with registration and response forms on CGM sites. Because TruCast is engine and script driven, and each transaction happens by way of a modular execution system, the system can tie the process to outside support modules to defeat these automation prevention systems. The response automation system has a complete POP3 e-mail interaction system which can generate e-mail addresses for use in registration, check those addresses for incoming mail, and navigate the mail content as easily as more typical web content. The response automation system uses advanced OCR processing along with human tuning to defeat CAPTCHA protections. The system has access to Jabber protocol interactions to create automated IM accounts and interact by SMS with mobile phone systems. TruCast also stores a significant body of information, in contact card format, about responders so that more complex registration questions can be correctly answered.

Conversation

The response system within TruCast delivers posts to blog sites, which are the target for the collection system. As the system, in one embodiment, collects content, it matches incoming content to evaluate whether that content belongs to a thread that the system has interacted with. When the system discovers posts that were written after a response that TruCast wrote, they are returned to the queue of posts assigned to the user who wrote the response, with maximum priority. In this way a conversation can be facilitated. The system also allows review of conversations by way of an Audit Panel, which gives a timeline of interaction for a conversation between a blogger and a TruCast user.
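By way of non-limiting illustration, the matching of newly collected posts to threads that contain an earlier response could be sketched in Python as follows. Identifying a thread by its URL, the record types, and the `MAX_PRIORITY` constant are assumptions of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime

MAX_PRIORITY = 0  # assumed convention: lower numbers are served first


@dataclass
class OurResponse:
    thread_url: str
    author_user: str   # system user who wrote the response
    posted: datetime


@dataclass
class CollectedPost:
    thread_url: str
    body: str
    posted: datetime


def route_follow_ups(collected, responses):
    """Return (user, post, priority) for posts written after our response."""
    earliest = {}
    for response in responses:
        current = earliest.get(response.thread_url)
        if current is None or response.posted < current.posted:
            earliest[response.thread_url] = response
    routed = []
    for post in collected:
        ours = earliest.get(post.thread_url)
        if ours is not None and post.posted > ours.posted:
            routed.append((ours.author_user, post, MAX_PRIORITY))
    return routed


responses = [OurResponse("http://blog.example/t1", "alice", datetime(2009, 1, 2))]
collected = [
    CollectedPost("http://blog.example/t1", "before", datetime(2009, 1, 1)),
    CollectedPost("http://blog.example/t1", "after", datetime(2009, 1, 3)),
    CollectedPost("http://blog.example/t2", "other", datetime(2009, 1, 3)),
]
routed = route_follow_ups(collected, responses)
```

Only the post that appears in the same thread after the response is returned to the responding user's queue; earlier posts and posts in unrelated threads follow the normal workflow.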

Transparency

Given the volatility of the CGM space, the value it represents, and the danger of negative publicity for any companies or other interested parties who choose to interact by way of responding by comment, it is optionally advantageous to maintain the appearance of correct attribution. The users are responsible for the content they generate. Because of the sophisticated analysis tools available for CGM site owners to evaluate the source of incoming comments, it is optionally advantageous that the system, in one embodiment, portrays attribution correctly. While using the TruCast system to automate response delivery to blog sites, correct attribution of content origination is retained.

Indicators of origination include: (1) E-mail address used in the registration/response process; (2) Owner of the e-mail address's domain, as reported by the WHOIS information; (3) Receipt, by the correct customer of the system, in one embodiment, of e-mail sent to this address; (4) IP address used in the response/registration process; (5) Reverse DNS lookup on the IP address used in the response/registration process, and the resultant WHOIS information; and/or (6) Internal consistency of blog user registration information.

Any given customer or user will designate a domain name that is appropriate for blog post response and connect this domain (and its MX record) to a web-accessible server. This server should make available the e-mail addresses hosted on it via a POP3 connection. This resolves issues 1 and 2 by placing ownership of the domain from which the e-mails for registration are generated into the hands of the users.

A forwarding system exists between e-mail addresses created by a robot and the e-mail address listed in the User Manager. Forwarding messages from this TruCast-controlled site to the customer's e-mail ensures that customers receive any messages from bloggers that reply by e-mail. This resolves issue 3.

The Response Automation tool receives port 80 from the IP address used for the e-mail server installation, and the server hosts the Response Automation Engine for use in executing the scripting that is created to perform automated response. This resolves issues 4 and 5 by aligning the IP source of the comments with the e-mail source of the comments.

The tool collects significantly more information about responders than is typically necessary. This includes obscure information such as birth date, favorite car, mother's maiden name, favorite popsicle flavor, user picture, etc., to ensure that registrations are complete, feature rich, and transparent. The manual response app and robots accept this data in the response and registration steps. This resolves issue 6.

By way of this unified approach to transparency, attribution accuracy should always be retained.

If customers or other users desire misattribution of message source, IP and e-mail anonymization features can be enabled. This obfuscates the source of output messages by way of a rotating IP proxy environment which leverages P2P and onion topologies for maximum opacity.

Administration

It is valuable to keep blog-focused workers on message, saying appropriate things, making persuasive arguments, and being considerate participants in the community. In order to facilitate this, the system, in one embodiment, has a set of authorization features. Administrators have access to a per-user toggle which forces the posts that users write to be delivered to a review queue instead of the response automation system when they press the “send” button. This queue is accessible by administrators to allow review, editing, or rejection before messages are submitted.

Administrators can also create and manipulate sorting rules which prioritize content within user scoring and response queues based on topic, site, engine, author, and date information. This directs users to work on appropriate content, and allows administrators to segment scoring and responding tasks to the subject matter experts (SMEs) who have the most context for a given topic, site, engine, or author.

Accounts

Users in the system, in one embodiment, are members of accounts, and are afforded permissions within the system based on the role assigned to them by administrative users on a per-account basis. Roles are pre-bound permission sets. Administrators can create, edit, and delete everything within the system, except accounts. Group administrators, who have access to multiple accounts, can create accounts, and can edit and delete accounts that they have created or been given access to. System administrators can add, edit, and delete all accounts, so this permission role is reserved for internal support use only. Users within the system, in one embodiment, are intended to perform the majority of the scoring and responding work, and as such have access only to the scoring manager, response manager, and their own user manager to review their own performance. Group users can do these tasks for multiple assigned accounts. Viewers within the system, in one embodiment, have read-only access to all UI controls. Group Viewers can review multiple accounts. Accounts as a whole can be enabled or disabled; disabling an account blocks its users from accessing the system and stops any account-specific collection, analysis, or processing tasks.
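By way of non-limiting illustration, roles as pre-bound permission sets could be sketched in Python as follows. The permission names and the exact mapping of roles to permissions are a simplified reading of the paragraph above, not a definition taken from the system.

```python
from enum import Flag, auto


class Perm(Flag):
    """Hypothetical permission flags."""
    READ = auto()                 # read-only access to UI controls
    SCORE_RESPOND = auto()        # scoring manager and response manager
    MANAGE_CONTENT = auto()       # create/edit/delete everything except accounts
    MANAGE_OWN_ACCOUNTS = auto()  # create accounts; edit/delete accessible ones
    MANAGE_ALL_ACCOUNTS = auto()  # add/edit/delete all accounts
    MULTI_ACCOUNT = auto()        # may act across multiple assigned accounts


# Roles are pre-bound permission sets.
ROLES = {
    "viewer": Perm.READ,
    "group_viewer": Perm.READ | Perm.MULTI_ACCOUNT,
    "user": Perm.SCORE_RESPOND,
    "group_user": Perm.SCORE_RESPOND | Perm.MULTI_ACCOUNT,
    "administrator": Perm.READ | Perm.SCORE_RESPOND | Perm.MANAGE_CONTENT,
    "group_administrator": (Perm.READ | Perm.SCORE_RESPOND | Perm.MANAGE_CONTENT
                            | Perm.MANAGE_OWN_ACCOUNTS | Perm.MULTI_ACCOUNT),
    "system_administrator": (Perm.READ | Perm.SCORE_RESPOND | Perm.MANAGE_CONTENT
                             | Perm.MANAGE_OWN_ACCOUNTS | Perm.MANAGE_ALL_ACCOUNTS
                             | Perm.MULTI_ACCOUNT),
}


def can(role: str, perm: Perm, account_enabled: bool = True) -> bool:
    """A disabled account blocks every action regardless of role."""
    return account_enabled and perm in ROLES[role]
```

Checking the account state before the role keeps the enable/disable switch independent of the permission sets, so disabling an account does not require editing any role.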

FIG. 15 shows an example screenshot of the user interface homepage 1300. The homepage 1300 enables a user to navigate through the different functions of the UI. The toolbar is located at the bottom of the screen and features two menus (account menu and control panel) and a row of eight icons: Account Manager 1305, User Manager 1310, Topic Manager 1315, Sorting 1320, Scoring Manager 1325, Response Manager 1330, Dashboards 1335, and Reporting 1340. The account manager 1305 is used to create/set-up accounts and deactivate/reactivate accounts. The user manager 1310 is used to set-up/create users, establish group rights and permissions, and to review user activity. The topic manager 1315 is used to set-up/create topics and to set-up/create key phrases. Sorting 1320 is used to set-up/create scoring and responding rules for a topic, site, author, engine, and/or date and assign rules to a specific user. The scoring manager 1325 is used to read/score posts and create new topics while scoring a post. The response manager 1330 is used to respond to posts in near real time and create/save personas and pre-determined responses. Dashboards 1335 is used to map and graph sentiment, impact, authority and data. Reporting 1340 is used to display statistical charts. Finally, a control panel 1345 is used to log out of TruCast and allows email to be sent directly to user support.

FIG. 16 shows an example screenshot of the account manager 1305. The account manager is accessed by a user through button 1305 in FIG. 15. The account manager 1305 creates and manages accounts in TruCast. Accounts serve as the logical groups of related users, topics, and other system elements. This creation action establishes a new GUID-identified accountID that is used by the backend systems to identify data pertinent to this account. Account is often synonymous with customer for TruCast.

FIG. 17 shows an example screenshot of the user manager 1310. The user manager 1310 allows administrators to set-up users, to assign specific rights/permissions to them and to evaluate their activity in TruCast. This is how a work team is created to address a specific target issue within the CGM space. Each new user is assigned a userID value to track their activities, identify their actions at the database level, enforce permissions and limit access. All users who login to TruCast already have a userID. The response authorization required flag determines if a user's responses need to be approved by an administrator via the authorization system.

FIG. 18 shows an example screenshot of a topic manager 1315. FIG. 19 shows an example screenshot of the topic manager 1315 with the Keyphrases tab activated. The Topic Manager 1315 is where administrators define topic titles, create topic descriptions, determine key phrases, and exclude specific phrases from the assigned topic. This will determine the content that is matched by the keyphrase tier of relevance analysis in TruCast. Topics are also analysis points, so they're used later to compare and contrast in the visualization systems. Each topic and keyphrase has a GUID value distinguishing it within the database systems.

FIG. 20 shows an example screenshot of a sorting manager 1320. FIG. 21 shows an example screenshot of the sorting manager with the users tab activated. Sorting 1320 enables administrators to define scoring and responding guidelines. Administrators can create rules that determine how either all users' or a specific user's scoring or responding queue will be sorted. These sorts impact the queue by matching, so all posts that match the rule are sorted to the top of the queue, which allows users to score items that are of general importance after completing scoring the posts that were specifically assigned to them by an administrator.
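The match-and-promote behavior of sorting rules can be sketched in Python as follows. This is only one possible reading of the paragraph above: posts matching a rule assigned to the specific user come first, then posts matching a rule that applies to all users, then everything else. The names `Post`, `SortRule`, and `sort_queue` are hypothetical, and rules here match on exact equality of a single attribute.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Post:
    post_id: int
    topic: str
    site: str
    author: str


@dataclass(frozen=True)
class SortRule:
    attribute: str              # "topic", "site", or "author"
    value: str
    user: Optional[str] = None  # None means the rule applies to all users

    def matches(self, post):
        return getattr(post, self.attribute) == self.value


def sort_queue(posts, rules, user):
    """Order a user's queue: user-specific matches, general matches, the rest."""

    def rank(post):
        if any(r.user == user and r.matches(post) for r in rules):
            return 0
        if any(r.user is None and r.matches(post) for r in rules):
            return 1
        return 2

    # sorted() is stable, so posts keep their original order within each rank.
    return sorted(posts, key=rank)
```

Because the sort is stable, the original delivery order of posts is preserved among posts of equal priority.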

FIG. 22 shows an example screenshot of a scoring manager 1325. The analysis system, having determined that a post matches either keyphrase or conceptual categorization, delivers posts, filtered by the sorting system, in a sequential queue to the scoring system. Scoring is the central method for users to impact the function of the automated systems, providing examples and context for their operation, and it's the shortest path for a post to make it from ingestion to visualizations and response. The post is placed in text box 2005, and the topics that the post relates to are in box 2010, which a user will rate using the radio buttons presented. Finally, the site information related to the post is placed in box 2015.

FIG. 23 shows an example screenshot of creating a new topic 2110 in the scoring manager 1325. Because pre-determined topics may not cover the scope or issues that exist in the discussion discovered by TruCast, TruCast allows scoring users to create topics, in the new topic text box 2110, on the fly to capture the observation that a new locus of discussion exists. These topics are not populated with keyphrases at this step. Instead, administrators have the capability to merge and delete topics from the Topic manager to ensure that all the team members who may have simultaneously discovered this new topic can receive direction from the administrator as to what the final topic title will be, and instructions by way of descriptions and scoring rules about how to interpret it.

FIG. 24 shows an example screenshot of a response manager 1330. The output from the analysis system and the scoring manager 1325 feeds into the response manager 1330 based on applicable sorting rules as assigned by administrators. Writing the response, in block 2210, and clicking “post” is all that's required, in one embodiment, to ensure that the message you typed makes it out as a comment on the target site. Your writing process is supported by significant contextual information, from the topic relevance and sentiment score information to stats about the original author and the site they posted on. Once you submit one response, the next item for your review is available immediately in the same panel, with no need to navigate to other pages or sites to find the next place to communicate.

FIG. 25 shows an example screenshot of an administrative queue 2300. The administrative queue tools allow administrators to exercise control over User response activities. These queues can be used for managerial oversight, legal review, tactical analysis, training, feedback and performance auditing. They create the framework for administrative authority over the response process.

FIG. 26 is an example screenshot of a dashboard manager 1335. The Dashboard displays data in dynamic graphical charts and graphs. It maps and reports information based on impact, sentiment, authority and data. This allows users to easily identify critical issues, compare topics of discussion for volume, breadth, depth, tone and interconnectedness of CGM discussions, as well as other useful insights about the CGM space.

FIG. 27 is an example screenshot of an Impact Dashboard 2500, which refers to a set of three line graphs showing daily totals over time that depict the breadth, depth, and participation of the discussions contained within one or many topics. This information is combined with a polar chart that shows the combined values of the three graphs for one period.

FIG. 28 is an example screenshot of a Sentiment Dashboard, which refers to a snapshot view of a single period, showing the relative post volume versus the average sentiment of your selected topics. FIG. 29 is an example screenshot of a Sentiment History Dashboard. The Sentiment Dashboard is connected to this history view, which displays the same information over time.

FIG. 30 is an example screenshot of an Authority Map Dashboard, which refers to a node and edge style interactive display which shows the interconnectedness and relative authority of individual authors within a given topic. It shows the topic as the center node, sites that contain relevant content as first edge nodes, and authors as second edge nodes. Edges between authors connote comments, links, quotes, and trackbacks as methods of identifying connection and communication. A list view on the right side of the screen allows you to quickly find specific authors or sites within the display. Adjustable level of depth controls allow users to establish constraints (show only authors with more than 2 links, show only positive authors, etc.) that affect the visibility of nodes in the display.

FIG. 31 is an example screenshot of a Data Dashboard, which refers to a display that shows a tabular result set of posts that matched the topics selected. This table shows one post per row with columns for date, author name, permalink, site name, sentiment, and topic. This view can show only information based on keyphrase-relevance, or full analyzed relevance, or show those two together. In several other dashboards there are links to more information about a given topic or author. Those links point to this display.

FIG. 32 is an example screenshot of an Ecosystem Map, which is a node and edge style display of all of the sites that make up the discussion ecosystem for a given topic or topics. It shows a node for each site that contains posts or comments relevant to the topics selected and date ranges selected in the dashboard launcher panel. Between nodes, there should be an edge for each link that connects nodes together.

FIGS. 31-32 show example screenshots of nodes. Nodes are size scaled depending on how many posts/relevant posts they have, and colored by average sentiment. Edges are thicker depending on how many links exist between two nodes, and have size scaled arrows showing the predominant direction or ratio of links. Nodes, if clicked on, should show the site name, # of posts total, # of relevant posts, and sentiment %. The name is a hyperlink to the site. By selecting an individual topic, a more detailed display with the sites and authors most important to that topic is displayed. Double-clicking on a node leads to the data dashboard with a list of all the titles and permalinks to the relevant posts on that site. Edges, if clicked on, show the # of links represented and % directionality.
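The per-node and per-edge statistics described above could be computed along the following lines. This Python sketch is illustrative only: the function name `build_ecosystem`, the input tuple layouts, and the choice to average sentiment over relevant posts are assumptions, and directionality is expressed here as the fraction of links running from the alphabetically first site to the second.

```python
from collections import Counter


def build_ecosystem(posts, links):
    """Aggregate node and edge statistics for an ecosystem-style display.

    posts: iterable of (site, is_relevant, sentiment) tuples (assumed layout).
    links: iterable of (source_site, target_site) tuples (assumed layout).
    """
    nodes = {}
    for site, is_relevant, sentiment in posts:
        node = nodes.setdefault(
            site, {"posts": 0, "relevant": 0, "sentiment_sum": 0.0})
        node["posts"] += 1
        if is_relevant:
            node["relevant"] += 1
            node["sentiment_sum"] += sentiment
    for node in nodes.values():
        # Assumption: average sentiment is taken over relevant posts only.
        node["avg_sentiment"] = (
            node["sentiment_sum"] / node["relevant"] if node["relevant"] else 0.0)

    edges = {}
    for (source, target), count in Counter(links).items():
        key = tuple(sorted((source, target)))  # one undirected edge per site pair
        edge = edges.setdefault(key, {"links": 0, "forward": 0})
        edge["links"] += count                 # drives edge thickness
        if (source, target) == key:
            edge["forward"] += count
    for edge in edges.values():
        # Fraction of links pointing from key[0] to key[1].
        edge["directionality"] = edge["forward"] / edge["links"]
    return nodes, edges
```

A rendering layer could then scale node size from `posts` or `relevant`, color nodes from `avg_sentiment`, and size arrows from `directionality`.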

FIG. 35 is an example screenshot of a Sentiment Summary, which refers to a single topic display that shows the number of authors per sentiment category on a given topic or sum of topics.

FIG. 36 is an example screenshot of Top Lists. This provides users with a set of ranked lists of sites, authors, and posts that are the most relevant, most popular, most negative, most positive, most authoritative, most influential, most linked to, most commented on, or most responded to depending on user selection.

FIG. 37 is an example screenshot of Reporting. The reporting system provides a series of charts based on selection criteria revolving around CGM content. The charts show daily or total values of posts by keyphrase match or post-analysis match, per topic or topics, site, or author, by date range. Performance metrics on scorers and responders are also available, per site, topic, or date range.

FIG. 38 is an example screenshot of an Aggregate Performance Dashboard. This dashboard supplies a cluster of configurable widgets for tracking the relationships between several KPIs associated with the data available within TruCast, in one embodiment.

While the preferred embodiment of the invention has been illustrated and described, as noted above, many changes can be made without departing from the spirit and scope of the invention. Accordingly, the scope of the invention is not limited by the disclosure of the preferred embodiment. Instead, the invention should be determined entirely by reference to the claims that follow.

1. A method for reputation management comprising: collecting consumer generated media content using a plurality of robots to parse web site code into XML encoded text; presenting an interactive graphical user interface to a user containing consumer generated media content; and replying to the consumer generated media in the graphical user interface, wherein when submitted, a robot, programmed to interact with the predefined consumer generated media website as a user, posts the response.
2. The method of claim 1, further comprising: filtering collected consumer generated media using a predefined fuzzy logic comparison between existing stored content and incoming collected content.
3. The method of claim 2, further comprising: determining the conceptual relevance of at least one collected consumer generated media post.
4. The method of claim 3, further comprising: prompting a user with at least one collected consumer generated media post; and inputting a conceptual category.
5. The method of claim 4, further comprising: statistically analyzing user inputted conceptual categorization; and automatically, using the statistical analysis, determining a conceptual category for at least one collected consumer generated media post.
6. The method of claim 5, further comprising: outputting a visualization to a user interface to present at least one of statistical, relational and graphical information to a user.
7. The method of claim 6, further comprising: forwarding at least one collected consumer generated media post to a user interface, in order to facilitate the writing of a response.
8. The method of claim 7, further comprising: displaying a user interface configured to access a response page from which at least one collected consumer generated media post was received; navigating at least one field; assigning each of the fields an identifier; and dynamically creating a robot using prewritten code and inputting a response page location and at least one identifier from that location.
9. The method of claim 8, further comprising: forwarding a response through a supervisor.
10. The method of claim 9, further comprising: tracking a location where a response was posted; and transmitting an alert to a user interface.
11. The method of claim 9, wherein consumer generated media is a weblog.
12. A method for responding to a site having consumer generated media comprising: configuring a software robot to interact with at least one field on a site; providing the software robot with predefined authentication protocols; responding to at least one collected consumer generated media post in a user interface; and posting the response, using the robot to log in to the site and post the response.
13. The method of claim 12, further comprising: displaying a user interface configured to access a response page from which at least one collected consumer generated media post was received; navigating at least one field; assigning each of the fields an identifier; and dynamically creating a robot using prewritten code and inputting a response page location and at least one identifier from that location.
14. The method of claim 13, further comprising: forwarding a response through a supervisor.
15. The method of claim 14, further comprising: tracking a location where a response was posted; and transmitting an alert to a user interface.
16. A system for searching a plurality of data products, the system comprising: a database configured to store at least one collected consumer generated media post; a display; and a processor in data communication with the display and with the database, the processor comprising: a first component configured to collect consumer generated media content using a plurality of robots to parse website code into XML encoded text; a second component configured to present an interactive graphical user interface to a user containing consumer generated media content; and a third component configured to reply to the consumer generated media in the graphical user interface, wherein when submitted, a robot, programmed to interact with the predefined consumer generated media website as a user, posts the response; wherein the components are located on at least one of a stand alone computer or a plurality of computers coupled to a network.
17. The system of claim 16, further comprising: a fourth component configured to display a user interface and further configured to access a response page from which at least one collected consumer generated media post was received; a fifth component configured to navigate at least one field; a sixth component configured to assign each of the fields an identifier; and a seventh component configured to dynamically create a robot using prewritten code and input a response page location and at least one identifier from that location.
18. The system of claim 17, further comprising: an eighth component configured to forward a response through a supervisor.
19. The system of claim 18, further comprising: a ninth component configured to track a location where a response was posted; and a tenth component configured to transmit an alert to a user interface.
20. A method for reputation management by interacting with consumer generated media stored in a digital location implemented by at least one computer, the method comprising: discovering consumer generated media using a plurality of keywords from a set of keywords configured to return consumer generated media embedded in a digital location; collecting consumer generated media from a plurality of sources using a plurality of robots configured to collect media related to a predetermined topic; testing the collected consumer generated media for conceptual relevance to the predetermined topic using the series of key words and storing a relevance factor; determining a sentiment of the collected consumer generated media based on the semantics of the language in the collected consumer generated media; outputting the collected consumer generated media to a user interface sorted based on the determined sentiment and the relevance factor; and replying, using the user interface, to a selected subset of the collected consumer generated media further comprising: drafting a response in the user interface to the selected collected consumer generated media; sending the response to an approval authority; and posting the response on approval by the approval authority to the selected collected consumer generated media, wherein, a robot, programmed to determine the source of the media and further programmed to interact with the source of the consumer generated media website as a user, posts the response.